Separating text and background in degraded document images - a comparison of global thresholding techniques for multi-stage thresholding

Authors

  • Graham Leedham
  • Saket Varma
  • Anish Patankar
  • Venu Govindaraju
Abstract

Before any processing of the textual content of a document image can be performed, the text must be separated from the background of the image. Several thresholding algorithms have previously been proposed and are widely used in document processing. None has been shown to be effective at thresholding difficult documents where the background and foreground are non-uniform. In this paper we investigate the use of three global thresholding algorithms (Otsu’s, Kapur’s entropy and Solihin’s quadratic integral ratio (QIR)) as the first stage in a multi-stage thresholding algorithm for use on degraded document images. It is concluded that Otsu’s and Kapur’s algorithms do not work well for difficult documents as they tend to over-threshold the image, thus losing much of the useful information. The QIR algorithm is more accurate in separating the foreground and background in these images, leaving a range of undecided, fuzzy pixels for later processing in a subsequent stage.
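As a rough illustration of two of the global techniques named in the abstract, the sketch below computes an Otsu threshold (maximum between-class variance) and a Kapur threshold (maximum sum of class entropies) from a grey-level histogram. It is a minimal textbook-style sketch, not the authors' implementation: the function names, the 256-level assumption and the flat pixel-list input are choices made here for illustration. The QIR algorithm is not reproduced because the abstract does not give its definition.

```python
import math


def histogram(pixels, levels=256):
    """Count how many pixels fall on each grey level (0..levels-1)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist


def otsu_threshold(hist):
    """Return the level t that maximises between-class variance.

    Pixels with grey level <= t form one class, the rest the other.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0          # pixel count of the lower class
    sum0 = 0.0      # grey-level sum of the lower class
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0:
            continue        # lower class still empty
        if w1 == 0:
            break           # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2   # proportional to the variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def kapur_threshold(hist):
    """Return the level t that maximises the sum of the two class entropies."""
    total = sum(hist)
    p = [h / total for h in hist]
    best_t, best_h = 0, -math.inf
    for t in range(len(p) - 1):
        w0 = sum(p[: t + 1])
        w1 = sum(p[t + 1:])
        if w0 <= 0 or w1 <= 0:
            continue        # one class is empty, entropy undefined
        h0 = -sum((q / w0) * math.log(q / w0) for q in p[: t + 1] if q > 0)
        h1 = -sum((q / w1) * math.log(q / w1) for q in p[t + 1:] if q > 0)
        if h0 + h1 > best_h:
            best_h, best_t = h0 + h1, t
    return best_t


if __name__ == "__main__":
    # Synthetic example: dark "ink" levels and bright "paper" levels.
    pixels = [20, 22, 24, 26] * 10 + [200, 205, 210] * 10
    hist = histogram(pixels)
    print("Otsu threshold:", otsu_threshold(hist))
    print("Kapur threshold:", kapur_threshold(hist))
```

Both functions return a single cut-off, which is why they force every pixel into foreground or background; a multi-stage scheme of the kind the abstract describes would instead leave an intermediate band of pixels undecided for a later stage.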

Similar resources

Effective Thresholding of Ancient Degraded Manuscript Folio Images

Thresholding is an essential procedure used in image segmentation and binarization applications. In this paper, the segmentation methods applied to document images for separating the text from the background comprise pure binarization and filtering combined with image processing algorithms. This paper describes a contrast-based thresholding method for old degraded manuscript images. It is an approach f...

Text/ Background separation in the degraded document images by combining several thresholding techniques

Extracting the text from the background is an important step in every document analysis and recognition process. While this extraction is easy for good-quality document images using simple global thresholding techniques, images of degraded documents require a more accurate analysis, and in this case we resort to local methods. Indeed, the latter are generally more efficient a...

Decompose algorithm for thresholding degraded historical document images

Numerous techniques have previously been proposed for single-stage thresholding of document images to separate the written or printed information from the background. Although these global or local thresholding techniques have proven effective on particular subclasses of documents, none is able to produce consistently good results on the wide range of document image qualities that exist in gene...

Degraded Document Image Binarization Techniques

Document Image Binarization is performed in the preprocessing stage for document analysis and it aims to segment the foreground text from the document background. A fast and accurate document image binarization technique is important for the ensuing document image processing tasks such as optical character recognition (OCR) and Document Image Retrieval (DIR). This research area has been studied...

A Novel Degraded Document Image Binarazation by using Local Thresholding Segmentation

The proposed binarization scheme divides image pixel values into two classes, black pixels as foreground and white pixels as background; thresholding is a well-known scheme for document image binarization. In this proposed work, this basic threshold value can be used further for the decomposition of both global and local thresholding. Here the global thresholding scheme is ef...

Publication date: 2002